Previously, I
demonstrated a Debian packaging workflow using Git and
I mentioned the possibility of a follow-up post; well, here it is: you want to
use my workflow (or one that's related) for a package that is currently
maintained with
Subversion on
svn.debian.org and you'd like to keep the history
during the conversion.
Make sure to read the
previous post before this
one.
I am again using the example of
mdadm since its
Git packaging repository is in a shambles and I want to start over to get it right
and import the history from
the previous
Subversion repository.
What better way than to write a blog post as I do so? Well, plenty actually.
This kind of post
isn't really made for a blog, and I have
started work on setting up
ikiwiki on
madduck.net, but it's not yet ready, so I'll stick with the blog
for now. I will make sure that links don't break as I move content over, so
feel free to bookmark this
Importing the package into Git
Thanks to
git-svn, the
initial step of getting your package imported into Git is a breeze:
$ git-svn clone --stdlayout --no-metadata \
svn+ssh://svn.debian.org/svn/pkg-mdadm/mdadm mdadm
Sit back and enjoy. If that command exits prematurely with an error such as
the following:
Malformed network data: Malformed network data at /usr/local/bin/git-svn
line 1029
then you should upgrade to a newer Git version, or have a look
here. If your Git does not know
--stdlayout then upgrade as well (or use
-T trunk -t tags -b branches
instead).
Sam Vilain notes that it is important to "get the attribution right with the
final SVN import - getting the authors map right." I didn't do that. If you
look at the repository resulting from the above command, you'll notice strange
commit authors, such as madduck@some-unique-uuid-from-svn. git-svn
allows you to map these to real names with real email addresses (see its
--authors-file option), which ensures
that the attributions are good for the whole world to see.
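For the record, here is roughly what such an authors map could look like. This is only a sketch: the name and address are placeholders I made up, and the clone command is left as a comment because it needs access to the SVN server. The file maps each SVN login to a name and email address, one per line.

```shell
# Hypothetical authors map; the real name and address are placeholders.
cat > authors.txt <<'EOF'
madduck = Example Maintainer <madduck@example.org>
EOF

# Hand the map to git-svn when cloning (not run here, it needs the network):
#   git-svn clone --stdlayout --no-metadata --authors-file=authors.txt \
#       svn+ssh://svn.debian.org/svn/pkg-mdadm/mdadm mdadm
cat authors.txt
```

Every login that appears in the SVN history needs an entry, otherwise git-svn stops and complains about the missing author.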
When done, switch to the repository and run
git-branch -r. As you'll see,
git-svn imported all SVN branches and tags as remote branches. You need
those if you want to bidirectionally track the Subversion repository, but we
are converting, as you may have guessed by the
--no-metadata switch above.
Therefore, we resort to
the Dinosaur method of converting branches to tags, which I'll simplify for
mdadm. We also just delete all remote branches after tagging, since
mdadm never used branches in the
SVN repository. Your mileage may
vary.
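# Print (not run) the commands that turn each tags/* branch into a tag.
git branch -r | sed -rne 's, *tags/([^@]+)$,\1,p' | while read tag; do
  echo "git tag debian/$tag tags/$tag^; git branch -r -d tags/$tag"
done
# Print (not run) the commands that delete all remote branches.
git branch -r | while read tag; do
  echo "git branch -r -d $tag"
done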
If that seems to work alright, then you can execute the commands.
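If you would rather see what the sed expression selects before trusting it, you can feed it some sample input. The branch names below are invented; the entry with the @ suffix imitates the kind of name@revision refs that git-svn sometimes leaves behind, which the expression skips on purpose.

```shell
# Made-up stand-in for the output of `git branch -r` after a git-svn import.
branches='  tags/2.5.6-8
  tags/2.5.6-9
  tags/2.5.6-9@123
  trunk'

# Same expression as above: keep only tags/* names that contain no @.
tags=$(printf '%s\n' "$branches" | sed -rne 's, *tags/([^@]+)$,\1,p')
printf '%s\n' "$tags"
```

Once the printed commands look right, piping the output of the loops into `sh` is one way to execute them.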
Sam Vilain (again) points me to git-pack-refs, after which you can edit
.git/packed-refs with an editor. This certainly leaves more room for
errors but might be significantly faster.
Cleaning up the SVN references
Even though we passed
--no-metadata to
git-svn, it did leave some
traces in
.git/, which we can now safely remove:
$ git config --remove-section svn-remote.svn
$ rm -r .git/svn
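To convince yourself that nothing is left over, you can check the configuration and the directory afterwards. The sketch below does this in a throwaway repository that merely imitates the leftovers; the URL is a placeholder, not the real one.

```shell
# Throwaway repository that mimics what git-svn leaves behind.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config svn-remote.svn.url svn+ssh://example.invalid/svn/pkg
mkdir -p .git/svn

# The cleanup from above.
git config --remove-section svn-remote.svn
rm -r .git/svn

# Both checks should come back empty.
leftover_config=$(git config --get-regexp '^svn-remote\.' || true)
leftover_dir=$(ls -d .git/svn 2>/dev/null || true)
echo "config: '${leftover_config}' dir: '${leftover_dir}'"
```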
Setting things straight
You can skip this section unless you want to know a bit about how to fix up
stuff with Git.
There were actually some nasty tagging errors leading up to the
2.5.6-9
release for
etch and I could never be bothered to fix those in
SVN,
but now I can (I love Git!):
$ git tag -d debian/2.5.6-10 # never existed
$ git tag -f debian/2.5.6-8 2.5.6-8~2 # mistagged
$ git checkout -b maint/etch 2.5.6-8 # this is when we diverged
$ git apply < /tmp/mdadm-2.5.6-8..2.5.6-9.diff
$ git add debian/po/gl.po debian/po/pt.po debian/changelog
$ git commit -s
$ git tag debian/2.5.6-9
Now that that's fixed, there is one other thing to worry about, namely the
very last commit to
SVN, which obsoletes the repository and points to the
Git repository. But that's not all of it. I was also silly enough to include
a fix in the
same commit. Let's see what Git can do. Since the process of
obsoletion involved nothing but adding a file, we can simply remove it again,
--amend the last
commit and provide a new log message:
$ git checkout master
$ git rm OBSOLETE debian/OBSOLETE
$ git commit --amend
Now the repository is in an acceptable state.
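If amending is new to you, here is the same trick in a throwaway repository. The file names and log messages are made up for the illustration; the point is that the last commit keeps the fix, loses the marker file, and gets a new message, without an extra commit appearing in the history.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"

echo "mdadm (2.6.2-1)" > changelog
git add changelog
git commit -q -m "initial packaging"

# The unfortunate last commit: a real fix plus the obsoletion marker.
echo "mdadm (2.6.2-2)" > changelog
echo "moved to Git" > OBSOLETE
git add changelog OBSOLETE
git commit -q -m "obsolete the repository, and fix something"

# Drop the marker and rewrite that commit in place with a new message.
git rm -q OBSOLETE
git commit -q --amend -m "fix something"

git log --format=%s
git ls-files
```

This is only safe because nothing has been published yet; amending rewrites the commit.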
Making ends meet
The
pkg-mdadm effort on svn.debian.org only
maintained the
./debian/ directory, separate from the upstream code, and boy
was that a bad idea. Just to give one example: think about what's involved in
preparing a Debian-specific patch against the upstream code. This has to end,
and we can make it end right here; let's import upstream's code (again not
using his ADSL line, but the
upstream branch of the
pkg-mdadm Git
repository; see the
previous post for
details):
$ git remote add upstream-repo git://git.debian.org/git/pkg-mdadm/mdadm
$ git config remote.upstream-repo.fetch \
+refs/heads/upstream:refs/remotes/upstream-repo/upstream
$ git fetch upstream-repo
$ git checkout -b upstream upstream-repo/upstream
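The fetch refspec deserves a second look: it restricts the remote to a single branch, so only that branch gets a remote-tracking ref. Here is a small sketch with a local directory standing in for the remote repository (the contents are invented):

```shell
# Stand-in for the remote repository, with a master and an upstream branch.
src=$(mktemp -d)
(
  cd "$src"
  git init -q .
  git config user.name "Example Maintainer"
  git config user.email "maintainer@example.org"
  git symbolic-ref HEAD refs/heads/master
  echo packaging > README
  git add README
  git commit -q -m "packaging"
  git checkout -q -b upstream
  echo source > mdadm.c
  git add mdadm.c
  git commit -q -m "upstream source"
)

# Our repository fetches only the upstream branch from it.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git remote add upstream-repo "$src"
git config remote.upstream-repo.fetch \
    +refs/heads/upstream:refs/remotes/upstream-repo/upstream
git fetch -q upstream-repo
git for-each-ref --format='%(refname)' refs/remotes
```

The remote's master branch never shows up under refs/remotes/upstream-repo/, which is exactly what we want here.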
Now we have two unconnected ancestries in our repository, and it's time to
join them together. The most logical way seems to be to use the last upstream
tag for which we have a Debian tag:
2.6.2.
For this, we branch off the corresponding Debian tag (
2.6.2-1) and merge
upstream's
2.6.2 tag into the new branch. This will be a temporary branch.
Then, we rebase (remember, nothing has been published yet) the master branch
on top of this temporary branch, before we end that branch's short life. The
Debian tag stays where it is since it describes the state of the repository at
the time of the release of
2.6.2-1.
$ git checkout -b tmp/join debian/2.6.2-1
$ git merge mdadm-2.6.2
$ git rebase tmp/join master
$ git branch -d tmp/join
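Here is the same join replayed in a throwaway repository, as a sketch with invented file contents. One assumption to flag: newer Git versions (2.9 and later, as far as I know) refuse to merge two histories without a common ancestor unless you pass --allow-unrelated-histories, so the sketch does that.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"

# First ancestry: upstream's source, tagged at the release.
git symbolic-ref HEAD refs/heads/upstream
echo "int main(void) { return 0; }" > mdadm.c
git add mdadm.c
git commit -q -m "upstream 2.6.2"
git tag mdadm-2.6.2

# Second, unrelated ancestry: only ./debian/, as imported from SVN.
git checkout -q --orphan master
git rm -q -rf .
mkdir debian
echo "mdadm (2.6.2-1)" > debian/changelog
git add debian
git commit -q -m "release 2.6.2-1"
git tag debian/2.6.2-1
echo "mdadm (2.6.2-2)" > debian/changelog
git commit -q -am "release 2.6.2-2"

# Join the two on a temporary branch, then move master on top of it.
git checkout -q -b tmp/join debian/2.6.2-1
git merge -q --allow-unrelated-histories -m "join upstream 2.6.2" mdadm-2.6.2
git rebase -q tmp/join master
git branch -q -d tmp/join

git ls-files
```

Afterwards master contains both the upstream source and ./debian/, while the debian/2.6.2-1 tag still points at the old, debian-only commit.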
It just so happens that the head of the
SVN repository, which is identical
to the tip of our
master branch, corresponds to Debian release
2.6.2-2, so we tag it:
$ git tag debian/2.6.2-2
We are now also "born" in the sense that maintenance in Git has started. Let's
mark that point in history. There is no real reason I can foresee for this
yet, but nonetheless:
$ git tag -s git-birth
Turning dpatch files into feature branches
We want to turn
dpatch files into feature branches, and we want to do it
"properly". We could branch, apply the patch, delete the patch file, check out
master and delete the patch file there as well, but that appears
"improper" to me at least; so instead, we'll cherry-pick:
$ git checkout -b deb/conffile-location
$ debian/patches/01-mdadm.conf-location.dpatch -apply
$ git rm debian/patches/01-mdadm.conf-location.dpatch
$ git commit -s
$ git commit -s $(git ls-files --others --modified)
I should quickly intervene to make sure you are following. I am making use of
Git's index here. Applying the patch makes the changes in the working tree,
but we did not tell Git that we want those to be part of the commit just yet.
Instead, we delete the
dpatch with
git-rm, which automatically
registers the deletion with the index. Thus, the first
git-commit creates
a commit which deletes the
dpatch, while the second
git-commit
creates a commit with all the changes from the
dpatch, using
git-ls-files to identify new and modified files.
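The split between index and working tree is easier to see in a throwaway repository. In this sketch the file names are invented and a plain echo stands in for applying the dpatch. One deviation, which rests on my assumption that untracked files have to be staged before they can be committed: the second step uses git add on the list from git-ls-files instead of handing the paths to git-commit directly.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"

mkdir -p debian/patches
echo "old location" > mdadm.conf.5
echo "placeholder patch" > debian/patches/01-example.dpatch
git add .
git commit -q -m "import"

# Stand-in for applying the dpatch: the working tree changes, the index does not.
echo "new location" > mdadm.conf.5
echo "notes" > README.example

# git rm stages the deletion, so the first commit contains nothing else.
git rm -q debian/patches/01-example.dpatch
git commit -q -m "remove 01-example.dpatch"

# Stage whatever is new or modified and commit it separately.
git add -- $(git ls-files --others --modified)
git commit -q -m "apply changes from 01-example.dpatch"

git log --format=%s --name-status
```

The unquoted command substitution breaks on file names with spaces; for a one-off import of a handful of patches that seemed acceptable to me.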
But for now, let's move on. We have two commits in the
deb/conffile-location branch, and one of those is relevant to the
master branch, so we cherry-pick it:
$ git cherry-pick deb/conffile-location^
If you're confused, let me explain: our goal is to have a number of feature
branches, of which
master is the one in which most of
./debian/ is
maintained. All the branches later come together in the long-living
build
branch, so
deb/conffile-location will never be merged back into
master. However, once we applied the
dpatch to the feature branch, we
can delete it from there and the
master branch. By cherry-picking, we
"import" the deletion to the
master branch.
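A small sketch of that cherry-pick, again in a throwaway repository with made-up names: the feature branch carries two commits, and master picks up only the first one (the deletion), leaving the actual code change on the feature branch.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"
git symbolic-ref HEAD refs/heads/master

mkdir -p debian/patches
echo "config /etc/mdadm.conf" > mdadm.c
echo "placeholder patch" > debian/patches/01-example.dpatch
git add .
git commit -q -m "import"

# Feature branch: first commit deletes the dpatch, second carries its changes.
git checkout -q -b deb/example
git rm -q debian/patches/01-example.dpatch
git commit -q -m "remove 01-example.dpatch"
echo "config /etc/mdadm/mdadm.conf" > mdadm.c
git commit -q -am "apply changes from 01-example.dpatch"

# Back on master, pick up only the deletion (the parent of the branch tip).
git checkout -q master
git cherry-pick deb/example^ >/dev/null

git ls-files
cat mdadm.c
```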
I repeat the same procedure for
deb/docs, merging all the
documentation-related
dpatches, but I'll spare you the details.
and then Git let me down
In the next step, I found I had misunderstood Git merging: I thought Git was
smart, but Linus had his reasons for calling Git the "stupid content tracker"
(more on that later). Read on as I am obsoleting
dpatch files that
upstream had merged:
99-*-FIX.dpatch.
For consistency, I wanted to cherry-pick each of the appropriate upstream
commits into the
master branch along with deleting the corresponding
dpatch file. Here is one example:
99-monitor-6+10-FIX.dpatch was
obsoleted by upstream's commit
66f8bbb; the
-x records the original
commit ID in the log:
$ git cherry-pick -x 66f8bbb
$ git rm debian/patches/99-monitor-6+10-FIX.dpatch
$ git commit -s -m"remove dpatch obsoleted by $(git rev-parse --short HEAD)"
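In case you have not used -x before, this sketch shows what it adds to the log message. The repository and its contents are invented for the illustration.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"
git symbolic-ref HEAD refs/heads/master

echo base > mdadm.c
git add mdadm.c
git commit -q -m "base"

# A fix that exists only on the upstream branch.
git checkout -q -b upstream
echo fix >> mdadm.c
git commit -q -am "upstream fix"
fix=$(git rev-parse upstream)

# Cherry-pick it into master and keep a pointer to the original commit.
git checkout -q master
git cherry-pick -x "$fix" >/dev/null
git log -1 --format=%B
```

The message of the new commit ends with a "(cherry picked from commit ...)" line naming the full ID of the original.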
I repeated the procedure for the other
dpatch files, removed the
dpatch infrastructure, and then went on to merge it all into
build to
build the package.
The
build branch is a long-living branch off
upstream, but which
upstream? I'll fast-forward you past
a segfault problem with mdadm, which
upstream
(thought he had) resolved with commit 23dc1ae after 2.6.3,
but he had not yet released
2.6.4. Looking at the commits between
23dc1ae and upstream's
HEAD at the time, I decided to include them all
and snapshot
4450e59:
$ git fetch upstream-repo
$ git checkout upstream
$ git merge upstream-repo
$ git tag mdadm-2.6.3+200709292116+4450e59 4450e59
$ git checkout master
$ git merge --no-commit mdadm-2.6.3+200709292116+4450e59
$ dch -v mdadm-2.6.3+200709292116+4450e59-1
$ git add debian/changelog
$ git commit -s
And then I called
poor-mans-gitbuild, which merges
master and then
deb/* into
build. Here is when stuff blew up.
I'll make a long story short (read
my description of the problem and
Linus' answer if you want to know more):
I thought Git was smart enough to identify merges common to both branches and
do the right thing, but it turns out that Git does not care
at all about commits; it
only worries about content and the end result. In our case, unfortunately
(or fortunately), the outcome meant a conflict because the upstream branch
introduced
a simple change (last hunk)
in the lines surrounding the patch we cherry-picked, and Git can't handle it.
The solution is either
not to cherry-pick at all, to cherry-pick
all commits touching
the context of the
dpatch, or to simply merge
upstream into all our
feature branches. In our case, the first is the easiest solution and since
importing
dpatch files is a one-time thing (thank
$DEITY), I'll leave
it at that.
Almost.
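If you want to watch it happen, the following sketch reproduces the situation in miniature. Everything in it is invented (three-line file, commit messages); it only assumes that upstream changes a line directly next to the one the cherry-picked fix touched.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Maintainer"
git config user.email "maintainer@example.org"
git symbolic-ref HEAD refs/heads/master

printf 'line1\nline2\nline3\n' > mdadm.c
git add mdadm.c
git commit -q -m "common ancestor"

# Upstream fixes line2, then touches the neighbouring line3.
git checkout -q -b upstream
printf 'line1\nline2 fixed\nline3\n' > mdadm.c
git commit -q -am "the fix"
fix=$(git rev-parse HEAD)
printf 'line1\nline2 fixed\nline3 changed\n' > mdadm.c
git commit -q -am "change a neighbouring line"

# master moves on with packaging work and cherry-picks only the fix.
git checkout -q master
echo "mdadm (2.6.3-1)" > changelog
git add changelog
git commit -q -m "packaging work"
git cherry-pick "$fix" >/dev/null

# Merging upstream now has to reconcile "fix" with "fix plus neighbour".
if git merge --no-edit upstream >/dev/null 2>&1; then
  result=clean
else
  result=conflict
fi
echo "merge result: $result"
git diff --name-only --diff-filter=U
```

Git only compares the three file versions involved (ancestor, ours, theirs); that the fix is the "same" commit on both sides plays no role.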
I have spent two days thinking about this more than I should have. And it was
this point Linus made which
made me appreciate Git even more:
Conflicts aren't bad - they're good. Trying to aggressively resolve them
automatically when two branches have done slightly different things in the
same area is stupid and just results in more problems. Instead, git tries to
do what I don't think anybody else has done: make the conflicts easy to
resolve, by allowing you to work with them in your normal working tree, and
still giving you a lot of tools to help you see what's going on.
The end
This concludes today's report. Importing the changes from the old Git repo,
tagging and merging the branches is all covered
in my previous post, or at least
you'll find enough information there to complete the exercise.
I would like to specifically thank Sam Vilain and Linus Torvalds for their
help in preparing this post, as well as the
#git/freenode inhabitants, as
always.
If you are interested in the topic of using version control for distro
packaging, I invite you to join
the vcs-pkg mailing list and/or the
#vcs-pkg/irc.oftc.net IRC channel.
Also, if you are interested in Git in general, you can find a
list of blog
posts on the
Git wiki.
NP:
The Police:
Zenyatta Mondatta